Tackling the Complexity of Context-Free Representations in Example-Based Machine Translation
Abstract
Machine Translation (MT) is seen as a mapping from a source language string into a target language string via an internal representation. I restrict the internal representations to derivation trees that can be generated from context-free grammars (CFG). It is shown that there are more than n! possible representations for an input string of length 1 < n < 20. Given this intractable complexity of internal CF-representations, all MT systems that rely on such representations implement a strategy or heuristic to reduce the search space. The paper discusses the Generalized Exemplar-Based MT system, which selects a subset of the possible representations at runtime. An analysis of the computational effort is provided.

1 Complexity of CF-Representation

I assume a universal context-free grammar (CFG) G that produces all possible representations for a given string s. I will examine the number of different representations that can be derived from the input string s given the universal grammar G. In the next section, I shall discuss the generalized exemplar-based MT system and analyse how it tackles the problem of representational complexity. I shall show that this approach has the potential to generate a universal grammar.

A string of length 1 (i.e. a string that consists of one word only) may either fit a grammar rule in G or else remain unrecognized. Only in the former case can the string be reduced and an internal representation be generated. To examine the number of different derivation trees that can be generated from a string of length n > 1, I shall assume that the context-free grammar G is sufficiently large that the whole input string can be reduced to a single node.

I will henceforth denote the set of possible representations for a string of length n as R(n) and the cardinality of that set as |R(n)|. The ith element of the set R(n) is denoted by R_i(n). If a particular string s is under consideration, I shall write R(s), |R(s)| and R_i(s) with the same meaning.
A string s of length 1 has thus exactly

Figure 1: Four possible representations of the string ab of length 2 that a CFG can generate. Each derivation tree T_j may be generated by an appropriate grammar G_j shown in the lower part of the picture.
T_i: X(a b)    T_ii: X(X(a) b)    T_iii: X(a X(b))    T_iv: X(X(a) X(b))
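
The growth claimed in the abstract can be explored with a small counting sketch. It rests on a counting model that is only inferred from Figure 1 and is not necessarily the derivation used in the paper: every node is labelled X, a node spanning two or more words has at least two children, and each single word may appear either bare or wrapped in its own X node. The function name `count_representations` and the treatment of the length-1 case (one representation, the word reduced to X) are assumptions made for illustration.

```python
from math import factorial


def count_representations(n: int) -> int:
    """Count derivation trees over n words under the assumed model.

    Assumed model (inferred from Figure 1, not taken from the paper):
    - the root is always a node X;
    - a node spanning >= 2 words has >= 2 children;
    - a single word is either a bare terminal or wrapped as X -> word.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return 1  # assumption: the single word is reduced to one X node

    # c[m]: ways to realise a span of m words as ONE constituent
    # seq[m]: ways to realise a span of m words as a sequence of >= 1 constituents
    c = [0] * (n + 1)
    seq = [0] * (n + 1)
    c[1] = 2      # bare word, or X -> word
    seq[1] = 2
    for m in range(2, n + 1):
        # first child covers k words, the remaining m - k words form >= 1 constituents
        multi = sum(c[k] * seq[m - k] for k in range(1, m))
        c[m] = multi          # one X node with >= 2 children
        seq[m] = 2 * multi    # either that single node, or >= 2 sibling constituents
    return c[n]


if __name__ == "__main__":
    for n in range(2, 21):
        print(n, count_representations(n), factorial(n))
```

Under these assumptions the count for n = 2 is 4, which agrees with the four trees of Figure 1, and the count stays above n! for n = 2 to 19 and falls below it at n = 20, which is consistent with the range stated in the abstract.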